New England groundfish have been harvested commercially for more than 400 years. However, in the last thirty years, stocks of several important species have declined severely with a few at historically low levels. The Northeast Fisheries Science Center (NEFSC) bottom trawl survey is an essential component for stock assessment and fishery management in New England. However, many stock assessments rely on uncertain estimates of catchability. Commercial fishers’ on the water observations as well as recent studies suggest that the NEFSC bottom trawl survey is not 100% efficient for most of the species it surveys. Differences in catch efficiency and inaccurate estimates of survey catchability can result in added uncertainty for stock assessments, impacting the management process, catch limits, and the economic viability of commercial fisheries. Following other experiments in the region, this study aims to provide estimates of survey efficiency for several species in the multi-species groundfish complex. In the absence of an experimental paired-gear study, an opportunistic approach was used to compare the relative efficiency of survey gears to commercial gears. High resolution fisheries dependent data from the NEFSC’s Study Fleet and Observer programs were leveraged to identify tows that occurred in the relative vicinity and timeframe of NEFSC survey tows. Catch ratios between commercial trawl gears and the NEFSC bottom trawl survey were compared and the relative efficiency for the NEFSC survey was estimated. Relative efficiency estimates can be used to help improve estimates of catchability in the stock assessment process.
A first-cut analysis will use paired t-tests or Wiloxcon sign-rank test to assess differences in mean catch between commercial and survey trawl nets. More advanced analysis will be conducted with generalized liner models (GLM) with a binomial distribution to model and compare catch rates. The response variable will be the survey catch divided by the total catch from the commercial trawl and survey trawl for each paired observation for each species of interest. \[ \frac{C_s}{C_s + C_c} \] Explanatory variables will include trawl net characteristics, environmental conditions and seasonal information. The two commercial datasets were investigated separately because they have different information, for example the observer dataset has more information on the type of trawl gear used and detailed gear characteristics.
#Looking at the cpua for the paired tows for each species
paired_tows_5mi_72hr_re %>%
filter(SVSPP == 73 & SPECIES_ITIS == 164712) %>%
distinct(EFFORT_ID, .keep_all = TRUE) %>%
filter(sf_species_cpua < 10000) %>%
filter(nefsc_species_cpua < 10000) %>%
ggplot() +
geom_bar(aes(x = EFFORT_ID, y = sf_species_cpua, fill = "Study Fleet"), stat = "identity") +
geom_bar(aes(x = EFFORT_ID, y = -nefsc_species_cpua, fill = "NEFSC"), stat = "identity") +
labs(x = "Tow Number", y = CPUA_label,
title = "Comparison of CPUA for Atlantic Cod",
subtitle = "Pairs within 5 mi and 72 hrs") +
scale_fill_manual(name = "Dataset",
values = c("Study Fleet" = "red", "NEFSC" = "blue"),
labels = c("Study Fleet", "NEFSC")) +
theme_classic() +
theme(axis.text.x=element_blank())
#Looking at the relative efficiency of the study fleet tows compared to the nefsc survey
paired_tows_5mi_72hr_re %>%
filter(SVSPP == 73 & SPECIES_ITIS == 164712) %>%
distinct(EFFORT_ID, .keep_all = TRUE) %>%
filter(sf_species_cpua < 10000) %>%
filter(nefsc_species_cpua < 10000) %>%
ggplot() +
geom_histogram(aes(x = relative_efficiency), bins = 50, color = "black", fill = "white") +
geom_vline(aes(xintercept = 0.5),
color = "black", linetype = "dashed", size = 1) +
geom_vline(aes(xintercept = mean(relative_efficiency)),
color = "red", linetype = "dashed", size = 1) +
labs(x = "Relative Efficiency", y = "Count",
title = "Histogram of Relative Efficiency for Atlantic Cod",
subtitle = "Pairs within 5 mi and 72 hrs")
#Map of the matched tows
paired_tows_5mi_72hr_re %>%
filter(SVSPP == 73 & SPECIES_ITIS == 164712) %>%
distinct(EFFORT_ID, .keep_all = TRUE) %>%
filter(sf_species_cpua < 10000) %>%
filter(nefsc_species_cpua < 10000) %>%
ggplot() +
geom_sf(data = nefsc_strata, fill = "grey") +
geom_polygon(data = NEUS, aes(x=long, y = lat, group = group), fill = "darkgreen",
color="black",inherit.aes = FALSE) +
geom_point(data = cod_paired_tows_5mi_72hr_re, aes(x = END_TOW_LON.x,y = END_TOW_LAT.x, color = "NEFSC")) +
geom_point(data = cod_paired_tows_5mi_72hr_re, aes(x = END_TOW_LON.y,y = END_TOW_LAT.y, color = "Study Fleet")) +
# geom_segment(data = paired_tows_1mi_24hr, aes(x = START_TOW_LON.x, xend = END_TOW_LON.x,
# y = START_TOW_LAT.x, yend = END_TOW_LAT.x)) +
coord_sf(ylim = c(39, 44), xlim = c(-74, -66)) +
scale_colour_manual("", breaks = c("Study Fleet", "NEFSC"), values = c("red", "blue")) +
labs(title = "Map of Matched Study Fleet and NEFSC Tows for Cod", subtitle = "within 5 mi and 72 hrs")
A comparison of the catch-per-unit area illustrates that the commercial trawls from the study fleet seem to be catching more Cod than the survey trawl for the matched tows. The histogram of the relative efficiency shows that the average relative efficiency is about 0.37, if the survey and commerical trawls were equally efficient it would be 0.5. A map of the matched tows shows that there are three distinct areas where the matched tows occurred: southern New England, Georges Bank, and the Gulf of Maine.
#looking at the boxplot and histograms of the catch per unit area
box.plot(cod_paired_tows_5mi_72hr_re)
hist_kh(cod_paired_tows_5mi_72hr_re)
#log-transforming the data
cod_paired_tows_5mi_72hr_re <- cod_paired_tows_5mi_72hr_re %>%
mutate(diff = sf_species_cpua - nefsc_species_cpua,
log_sf_species_cpua = log(sf_species_cpua),
log_nefsc_species_cpua = log(nefsc_species_cpua),
diff_of_logs = log_sf_species_cpua - log_nefsc_species_cpua)
#looking at the boxplot and histograms of the difference between CPUA
box.plot_diff(cod_paired_tows_5mi_72hr_re)
hist_kh_diff(cod_paired_tows_5mi_72hr_re)
#looking at the boxplot and histograms of the diff_og_logs
box.plot_diff_log(cod_paired_tows_5mi_72hr_re)
hist_kh_diff_log(cod_paired_tows_5mi_72hr_re)
#assumptions of normality and equal variance are met with the log transformed data,
#so the t-test was conducted with the log-transformed data
#running t-test
cod_ttest <- t.test(cod_paired_tows_5mi_72hr_re$log_sf_species_cpua, cod_paired_tows_5mi_72hr_re$log_nefsc_species_cpua, paired = TRUE)
cod_ttest
##
## Paired t-test
##
## data: cod_paired_tows_5mi_72hr_re$log_sf_species_cpua and cod_paired_tows_5mi_72hr_re$log_nefsc_species_cpua
## t = 4.5522, df = 96, p-value = 0.00001556
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.4845498 1.2338666
## sample estimates:
## mean of the differences
## 0.8592082
#back transform the estimate
exp(cod_ttest$estimate)
## mean of the differences
## 2.36129
exp(cod_ttest$conf.int)
## [1] 1.623444 3.434484
## attr(,"conf.level")
## [1] 0.95
The boxplot for the data shows that the variance is not equal and the histogram shows that the catch rates for the commercial and survey nets are not normally distributed. However, because the data are paired, two observations (commercial/survey CPUA) from one study unit (‘paired tow’) the difference in catch-per-unit area needs to be investigated. The difference in catch-per-unit area is not normally distributed and is not symmetric, so a log-transformation was performed. The histograms of the log-transformed data show that they are relatively normally distributed and the boxplot shows relative symmetry, so the analysis was conducted with the log-transformed data.
These data provide substantial evidence that the mean difference between commercial and survey catch-per-unit area is non-zero (two-sided p-value < 0.0001 from a paired t-test). The mean for catch rate for Cod for commercial trawls from the study fleet was estimated to be 2.36 times the mean catch rate for the NEFSC survey (95% confidence interval: 1.62 to 3.43 times).
#scatterplot matrix of all the variables of interest
ggpairs(cod_paired_tows_5mi_72hr_re, columns = c("VESSEL_NAME", "season", "mgmt_area", "AVGDEPTH", "BOTTEMP"),
cardinality_threshold = 17)
#running all the models w every combination of variables and comparing them based on AIC and deviance explained
codm <- fit.models(cod_paired_tows_5mi_72hr_re)
compare.models(codm)
| Model | AIC | deltaAIC | Deviance Explained |
|---|---|---|---|
| relative_efficiency ~ VESSEL_NAME +season + AVGDEPTH | 144 | 23 | 0.24 |
| relative_efficiency ~ VESSEL_NAME +AVGDEPTH | 144 | 23 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +season | 142 | 22 | 0.23 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area | 142 | 22 | 0.29 |
| relative_efficiency ~ VESSEL_NAME +mgmt_area | 142 | 21 | 0.27 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area + AVGDEPTH | 142 | 21 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +mgmt_area + AVGDEPTH | 141 | 20 | 0.42 |
| relative_efficiency ~ VESSEL_NAME +AVGDEPTH + BOTTEMP | 129 | 9 | 0.30 |
| relative_efficiency ~ VESSEL_NAME +BOTTEMP | 128 | 7 | 0.33 |
| relative_efficiency ~ VESSEL_NAME +mgmt_area + BOTTEMP | 127 | 7 | 0.31 |
| relative_efficiency ~ VESSEL_NAME +mgmt_area + AVGDEPTH + BOTTEMP | 125 | 5 | 0.32 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area + AVGDEPTH + BOTTEMP | 123 | 3 | 0.42 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area + BOTTEMP | 122 | 2 | 0.43 |
| relative_efficiency ~ VESSEL_NAME +season + AVGDEPTH + BOTTEMP | 122 | 2 | 0.39 |
| relative_efficiency ~ VESSEL_NAME +season + BOTTEMP | 121 | 0 | 0.44 |
#dropping the NAs in the bottom-temp column
cod_paired_tows_5mi_72hr_re1 <- cod_paired_tows_5mi_72hr_re %>%
drop_na(BOTTEMP)
#storing the optimal model and plotting the diagnostics
codm_optimal <- glm(relative_efficiency ~ VESSEL_NAME + season + BOTTEMP, family = binomial, data = cod_paired_tows_5mi_72hr_re1)
summary(codm_optimal)
##
## Call:
## glm(formula = relative_efficiency ~ VESSEL_NAME + season + BOTTEMP,
## family = binomial, data = cod_paired_tows_5mi_72hr_re1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.99051 -0.38422 -0.07438 0.28222 1.59493
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.38231 4.19793 1.520 0.1284
## VESSEL_NAMEANGELA AND ROSE -1.84122 2.58070 -0.713 0.4756
## VESSEL_NAMEBANTRY BAY -0.57134 1.98281 -0.288 0.7732
## VESSEL_NAMEDEBBIE SUE 2.59255 2.48559 1.043 0.2969
## VESSEL_NAMEELIZABETH & KATHERINE -1.85155 2.96040 -0.625 0.5317
## VESSEL_NAMEELLEN DIANE -1.92680 3.02801 -0.636 0.5246
## VESSEL_NAMEILLUSION 2.94936 2.01506 1.464 0.1433
## VESSEL_NAMEJULIE ANN II 0.48688 2.93234 0.166 0.8681
## VESSEL_NAMELADY JANE -0.01325 2.20066 -0.006 0.9952
## VESSEL_NAMELIGHTNING BAY 1.03748 2.28454 0.454 0.6497
## VESSEL_NAMELISA ANN II -0.48460 2.12686 -0.228 0.8198
## VESSEL_NAMELISA ANN III -1.13499 2.05536 -0.552 0.5808
## VESSEL_NAMEMARY K 0.31465 2.00326 0.157 0.8752
## VESSEL_NAMEMYSTIC -0.99968 2.09555 -0.477 0.6333
## VESSEL_NAMEPROUD MARY -0.25380 1.84218 -0.138 0.8904
## VESSEL_NAMESAO PAULO 2.56159 2.31609 1.106 0.2687
## VESSEL_NAMEVIRGINIA MARISE 1.73632 2.25721 0.769 0.4418
## seasonSpring -4.76016 2.39518 -1.987 0.0469 *
## BOTTEMP -0.56756 0.34343 -1.653 0.0984 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 34.846 on 87 degrees of freedom
## Residual deviance: 20.237 on 69 degrees of freedom
## AIC: 120.62
##
## Number of Fisher Scoring iterations: 5
par(mfrow=c(2,2))
plot(codm_optimal)
#back transform the estimates and CIs
exp(cbind(OR = coef(codm_optimal), confint(codm_optimal)))
## OR 2.5 % 97.5 %
## (Intercept) 591.292459414 0.162293638272 3538678.9619207
## VESSEL_NAMEANGELA AND ROSE 0.158623367 0.000687165966 54.0605767
## VESSEL_NAMEBANTRY BAY 0.564769973 0.011729491708 100.1315050
## VESSEL_NAMEDEBBIE SUE 13.363852717 0.146915769377 5164.4709266
## VESSEL_NAMEELIZABETH & KATHERINE 0.156994226 0.000001201245 55.5718625
## VESSEL_NAMEELLEN DIANE 0.145613112 0.000244491967 129.7574586
## VESSEL_NAMEILLUSION 19.093682355 0.472936473989 3648.7819642
## VESSEL_NAMEJULIE ANN II 1.627229777 0.005327979453 3419.3671994
## VESSEL_NAMELADY JANE 0.986836257 0.014833851599 225.2333466
## VESSEL_NAMELIGHTNING BAY 2.822093906 0.019280503169 671.7331853
## VESSEL_NAMELISA ANN II 0.615941092 0.006935902171 122.8853938
## VESSEL_NAMELISA ANN III 0.321425362 0.003856911038 59.5855919
## VESSEL_NAMEMARY K 1.369783894 0.027815718338 249.8621894
## VESSEL_NAMEMYSTIC 0.367996732 0.006650077343 74.3270091
## VESSEL_NAMEPROUD MARY 0.775843183 0.025269066707 122.6586934
## VESSEL_NAMESAO PAULO 12.956372963 0.172241576370 3764.4715404
## VESSEL_NAMEVIRGINIA MARISE 5.676413856 0.092525554042 1727.4990242
## seasonSpring 0.008564267 0.000052531766 0.7387508
## BOTTEMP 0.566907277 0.273567409672 1.0761265
#look at the component residual plots
crPlots(codm_optimal)
# looking at the effect_plot - these dont work with binomial regression
# effect_plot(codm_optimal, pred = VESSEL_NAME) +
# labs(x = "Vessel Name", y = "Relative Efficiency") +
# coord_flip()
# Extract the fitted and predicted values
cod_paired_tows_5mi_72hr_re1$predict <- predict(codm_optimal)
cod_paired_tows_5mi_72hr_re1$fitted <- fitted(codm_optimal)
#predicted vs observed values
cod_paired_tows_5mi_72hr_re1 %>%
ggplot(aes(x = fitted,
y = relative_efficiency)) +
geom_point() +
geom_abline(intercept=0, slope=1) +
xlim(0,1) +
ylim(0,1) +
labs(x='Predicted Values', y='Actual Values', title='Predicted vs. Actual Values')
# #predicted vs trawl type w point shape as explanatory variables - not important for this project but saving for my thesis work
# #mgmt_area
# cod_paired_tows_5mi_72hr_re1 %>%
# ggplot(aes(x = VESSEL_NAME,
# y = fitted, color = mgmt_area)) +
# geom_jitter() +
# labs(x = "Vessel Name", y = "Fitted Values") +
# coord_flip()
#
# #season
# cod_paired_tows_5mi_72hr_re1 %>%
# ggplot(aes(x = VESSEL_NAME,
# y = fitted, color = season)) +
# geom_jitter() +
# coord_flip()
#
# #depth
# cod_paired_tows_5mi_72hr_re1 %>%
# ggplot(aes(x = VESSEL_NAME,
# y = fitted, color = BOTTEMP)) +
# geom_jitter() +
# coord_flip()
#getting final RE estimate
cod_paired_tows_5mi_72hr_re1 %>%
group_by(VESSEL_NAME) %>%
summarise(mean(fitted), n = n())
## # A tibble: 17 × 3
## VESSEL_NAME `mean(fitted)` n
## <chr> <dbl> <int>
## 1 ANGELA & ROSE 0.258 2
## 2 ANGELA AND ROSE 0.322 2
## 3 BANTRY BAY 0.369 10
## 4 DEBBIE SUE 0.586 2
## 5 ELIZABETH & KATHERINE 0.0915 2
## 6 ELLEN DIANE 0.514 1
## 7 ILLUSION 0.578 8
## 8 JULIE ANN II 0.658 1
## 9 LADY JANE 0.505 6
## 10 LIGHTNING BAY 0.213 3
## 11 LISA ANN II 0.204 4
## 12 LISA ANN III 0.126 6
## 13 MARY K 0.135 9
## 14 MYSTIC 0.395 21
## 15 PROUD MARY 0.332 7
## 16 SAO PAULO 0.596 2
## 17 VIRGINIA MARISE 0.703 2
cod_paired_tows_5mi_72hr_re1 %>%
summarise(mean(fitted), n = n())
## mean(fitted) n
## 1 0.3600107 88
cod_paired_tows_5mi_72hr_re1 %>%
ggplot(aes(x = VESSEL_NAME,
y = fitted)) +
geom_jitter() +
stat_summary(
geom = "point",
fun = "mean",
col = "black",
size = 3,
shape = 24,
fill = "red"
) +
labs(x = "Vessel Name", y = "Fitted Values of Relative Efficiency") +
geom_hline(yintercept = 0.3600107, linetype = "dashed", color = "red") +
coord_flip()
A scatterplot matrix of all the variables of interest showed no correlation among explanatory variables, but this is difficult to assess given that most of the explanatory variables are categorical.
A backward regression analysis was conducted, however no variables were found to be significant so model selection was based on AIC score and the deviance explained.
The optimal model for relative efficiency for Cod from the Study Fleet dataset included vessel, season, and bottom temperature as explanatory variables. The diagnostic plots show no concerning trends, so the interpretation of results proceeded with this optimal model.
The component residual show a strong negative relationship with bottom temperature and higher relative efficiency in the fall.
The most efficient vessels as compared to the reference vessel, the Angela & Rose, after accounting for season and bottom temperature were the Debbie Sue which had a relative efficiency for Cod 13.36 times higher (95% CI: 0.14 to 5164 times), the Illusion which had a relative efficiency 19.10 times higher (95% CI: 0.47 to 3648 times), and the Sao Paulo which had a relative efficiency 12.96 times higher (95% CI: 0.17 to 3764 times).
The modeled values of relative efficiency show that the most efficient vessels compared to the survey trawl were the Mary K, Lisa Ann III, Lisa Ann II, and the Elizabeth & Katherine. The mean of the relative efficiency of the modeled values was 0.36, which is substantially less than if the commerical and survey trawls had equal efficiency
#Looking at the cpua for the paired tows for each species
paired_tows_5mi_72hr_re %>%
filter(SVSPP == 105 & SPECIES_ITIS == 172909) %>%
distinct(EFFORT_ID, .keep_all = TRUE) %>%
ggplot() +
geom_bar(aes(x = EFFORT_ID, y = sf_species_cpua, fill = "Study Fleet"), stat = "identity") +
geom_bar(aes(x = EFFORT_ID, y = -nefsc_species_cpua, fill = "NEFSC"), stat = "identity") +
labs(x = "Tow Number", y = CPUA_label,
title = "Comparison of CPUA for Yellowtail Flounder",
subtitle = "Pairs within 5 mi and 72 hrs") +
scale_fill_manual(name = "Dataset",
values = c("Study Fleet" = "red", "NEFSC" = "blue"),
labels = c("Study Fleet", "NEFSC")) +
theme_classic() +
theme(axis.text.x=element_blank())
#filter out the outlier tow and retain only tows with less than 10,000 lbs/km2
paired_tows_5mi_72hr_re <- paired_tows_5mi_72hr_re %>%
filter(SVSPP == 105 & SPECIES_ITIS == 172909) %>%
filter(sf_species_cpua < 10000)
#re run that plot without the outlier
paired_tows_5mi_72hr_re %>%
filter(SVSPP == 105 & SPECIES_ITIS == 172909) %>%
distinct(EFFORT_ID, .keep_all = TRUE) %>%
ggplot() +
geom_bar(aes(x = EFFORT_ID, y = sf_species_cpua, fill = "Study Fleet"), stat = "identity") +
geom_bar(aes(x = EFFORT_ID, y = -nefsc_species_cpua, fill = "NEFSC"), stat = "identity") +
labs(x = "Tow Number", y = CPUA_label,
title = "Comparison of CPUA for Yellowtail Flounder",
subtitle = "Pairs within 5 mi and 72 hrs") +
scale_fill_manual(name = "Dataset",
values = c("Study Fleet" = "red", "NEFSC" = "blue"),
labels = c("Study Fleet", "NEFSC")) +
theme_classic() +
theme(axis.text.x=element_blank())
#filtering it out in the yellowtail species dataset
yellowtailfl_paired_tows_5mi_72hr_re <- yellowtailfl_paired_tows_5mi_72hr_re %>%
filter(sf_species_cpua < 10000,
nefsc_species_cpua < 10000)
#Looking at the relative efficiency of the study fleet tows compared to the nefsc survey
paired_tows_5mi_72hr_re %>%
filter(SVSPP == 105 & SPECIES_ITIS == 172909) %>%
distinct(EFFORT_ID, .keep_all = TRUE) %>%
ggplot() +
geom_histogram(aes(x = relative_efficiency), bins = 50, color = "black", fill = "white") +
geom_vline(aes(xintercept = 0.5),
color = "black", linetype = "dashed", size = 1) +
geom_vline(aes(xintercept = mean(relative_efficiency)),
color = "red", linetype = "dashed", size = 1) +
labs(x = "Relative Efficiency", y = "Count",
title = "Histogram of Relative Efficiency for Yellowtail Flounder",
subtitle = "Pairs within 5 mi and 72 hrs")
#Map of the matched tows
paired_tows_5mi_72hr_re %>%
filter(SVSPP == 105 & SPECIES_ITIS == 172909) %>%
distinct(EFFORT_ID, .keep_all = TRUE) %>%
ggplot() +
geom_sf(data = nefsc_strata, fill = "grey") +
geom_polygon(data = NEUS, aes(x=long, y = lat, group = group), fill = "darkgreen",
color="black",inherit.aes = FALSE) +
geom_point(data = yellowtailfl_paired_tows_5mi_72hr_re, aes(x = END_TOW_LON.x,y = END_TOW_LAT.x, color = "NEFSC")) +
geom_point(data = yellowtailfl_paired_tows_5mi_72hr_re, aes(x = END_TOW_LON.y,y = END_TOW_LAT.y, color = "Study Fleet")) +
# geom_segment(data = paired_tows_1mi_24hr, aes(x = START_TOW_LON.x, xend = END_TOW_LON.x,
# y = START_TOW_LAT.x, yend = END_TOW_LAT.x)) +
coord_sf(ylim = c(39, 44), xlim = c(-74, -66)) +
scale_colour_manual("", breaks = c("Study Fleet", "NEFSC"), values = c("red", "blue")) +
labs(title = "Map of Matched Study Fleet and NEFSC Tows for Yellowtail Flounder", subtitle = "within 5 mi and 72 hrs")
A comparison of the catch-per-unit area illustrates one large outlier in the Study Fleet dataset. The CPUA was limited to less than 10,000 lbs/km^2. The subsequent plot shows that the commercial trawls from the study fleet seem to catch more yellowtail flounder than the survey trawl in most instances. However, there are some matched tows where the survey trawl outperformed the commercial trawls. The histogram of the relative efficiency shows that the average relative efficiency is slightly lower than if the survey and commercial trawls were equally efficient. A map of the matched tows shows that there are three distinct areas where the matched tows occurred: southern New England, Georges Bank, and the Gulf of Maine. The majority of matched tows are in the Gulf of Maine and the eastern edge of Georges Bank.
#looking at the boxplot and histograms of the catch per unit area
box.plot(yellowtailfl_paired_tows_5mi_72hr_re)
hist_kh(yellowtailfl_paired_tows_5mi_72hr_re)
#log-transforming the data
yellowtailfl_paired_tows_5mi_72hr_re <- yellowtailfl_paired_tows_5mi_72hr_re %>%
mutate(diff = sf_species_cpua - nefsc_species_cpua,
log_sf_species_cpua = log(sf_species_cpua),
log_nefsc_species_cpua = log(nefsc_species_cpua),
diff_of_logs = log_sf_species_cpua - log_nefsc_species_cpua)
#looking at the boxplot and histograms of the difference between CPUA
box.plot_diff(yellowtailfl_paired_tows_5mi_72hr_re)
hist_kh_diff(yellowtailfl_paired_tows_5mi_72hr_re)
#looking at the boxplot and histograms of the diff_og_logs
box.plot_diff_log(yellowtailfl_paired_tows_5mi_72hr_re)
hist_kh_diff_log(yellowtailfl_paired_tows_5mi_72hr_re)
#assumptions of normality and equal variance are met with the log transformed data,
#so the t-test was conducted with the log-transformed data
#assumptions of normality and equal variance are not met with the log transformed data,
#so the t-test was conducted with the log-transformed data
yellowtail_ttest <- t.test(yellowtailfl_paired_tows_5mi_72hr_re$log_sf_species_cpua, yellowtailfl_paired_tows_5mi_72hr_re$log_nefsc_species_cpua,
paired = TRUE)
yellowtail_ttest
##
## Paired t-test
##
## data: yellowtailfl_paired_tows_5mi_72hr_re$log_sf_species_cpua and yellowtailfl_paired_tows_5mi_72hr_re$log_nefsc_species_cpua
## t = 2.3235, df = 119, p-value = 0.02185
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.06482849 0.81246080
## sample estimates:
## mean of the differences
## 0.4386446
#back transform the estimate
exp(yellowtail_ttest$estimate)
## mean of the differences
## 1.550604
exp(yellowtail_ttest$conf.int)
## [1] 1.066976 2.253446
## attr(,"conf.level")
## [1] 0.95
The boxplot for the data shows that the variance is relatively equal but the histogram shows that the catch rates for the commercial and survey nets are not normally distributed. However, because the data are paired, two observations (commercial/survey CPUA) from one study unit (‘paired tow’) the difference in catch-per-unit area needs to be investigated. The difference in catch-per-unit area is not normally distributed and is partially symmetric, so a log-transformation was performed. The histograms of the log-transformed data show that they are relatively normally distributed and the boxplot shows relative symmetry, so the analysis was conducted with the log-transformed data.
These data provide moderate evidence that the mean difference between commercial and survey catch rates is non-zero (two-sided p-value = 0.022 from a paired t-test). The mean catch rate for Yellowtail Flounder for commercial trawls from the study fleet was estimated to be 1.55 times the mean catch rate for the NEFSC survey (95% confidence interval: 1.07 to 2.25 times).
#scatterplot matrix of all the variables of interest
ggpairs(yellowtailfl_paired_tows_5mi_72hr_re, columns = c("VESSEL_NAME", "season", "mgmt_area", "AVGDEPTH", "BOTTEMP"),
cardinality_threshold = 17)
#running all the models w every combination of variables and comparing them based on AIC and deviance explained
yellowtailm <- fit.models(yellowtailfl_paired_tows_5mi_72hr_re)
compare.models(yellowtailm)
| Model | AIC | deltaAIC | Deviance Explained |
|---|---|---|---|
| relative_efficiency ~ VESSEL_NAME +mgmt_area | 175 | 34 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +season | 171 | 30 | 0.20 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area | 171 | 30 | 0.41 |
| relative_efficiency ~ VESSEL_NAME +AVGDEPTH | 160 | 19 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +mgmt_area + AVGDEPTH | 160 | 19 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +season + BOTTEMP | 159 | 18 | 0.50 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area + BOTTEMP | 159 | 18 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +BOTTEMP | 157 | 16 | 0.41 |
| relative_efficiency ~ VESSEL_NAME +mgmt_area + BOTTEMP | 157 | 16 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +season + AVGDEPTH | 152 | 11 | 0.49 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area + AVGDEPTH | 152 | 11 | 0.50 |
| relative_efficiency ~ VESSEL_NAME +season + AVGDEPTH + BOTTEMP | 142 | 1 | 0.25 |
| relative_efficiency ~ VESSEL_NAME +season + mgmt_area + AVGDEPTH + BOTTEMP | 142 | 1 | 0.51 |
| relative_efficiency ~ VESSEL_NAME +AVGDEPTH + BOTTEMP | 141 | 0 | 0.49 |
| relative_efficiency ~ VESSEL_NAME +mgmt_area + AVGDEPTH + BOTTEMP | 141 | 0 | 0.51 |
#dropping the NAs in the bottom-temp column
yellowtailfl_paired_tows_5mi_72hr_re1 <- yellowtailfl_paired_tows_5mi_72hr_re %>%
drop_na(BOTTEMP, mgmt_area)
#storing the optimal model and plotting the diagnostics
yellowtailm_optimal <- glm(relative_efficiency ~ VESSEL_NAME + AVGDEPTH + BOTTEMP, family = binomial, data = yellowtailfl_paired_tows_5mi_72hr_re1)
summary(yellowtailm_optimal)
##
## Call:
## glm(formula = relative_efficiency ~ VESSEL_NAME + AVGDEPTH +
## BOTTEMP, family = binomial, data = yellowtailfl_paired_tows_5mi_72hr_re1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.93173 -0.36474 -0.02228 0.28077 1.23972
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.93508 1.76256 1.098 0.2723
## VESSEL_NAMEANGELA AND ROSE 1.37423 2.14399 0.641 0.5215
## VESSEL_NAMEBANTRY BAY -0.25105 1.61989 -0.155 0.8768
## VESSEL_NAMEDEBBIE SUE 0.40544 2.55402 0.159 0.8739
## VESSEL_NAMEELIZABETH & KATHERINE -2.04237 1.84042 -1.110 0.2671
## VESSEL_NAMEELLEN DIANE 0.83354 2.19646 0.379 0.7043
## VESSEL_NAMEILLUSION -0.06687 1.57722 -0.042 0.9662
## VESSEL_NAMEJULIE ANN II 0.11646 2.80843 0.041 0.9669
## VESSEL_NAMELADY JANE 0.11676 1.91581 0.061 0.9514
## VESSEL_NAMELISA ANN II 0.56033 1.80611 0.310 0.7564
## VESSEL_NAMELISA ANN III 0.49732 1.58225 0.314 0.7533
## VESSEL_NAMEMARY K -2.89538 3.53056 -0.820 0.4122
## VESSEL_NAMEMYSTIC -0.52740 1.64472 -0.321 0.7485
## VESSEL_NAMEPROUD MARY -0.26941 1.62548 -0.166 0.8684
## VESSEL_NAMESAO PAULO -0.29239 1.58220 -0.185 0.8534
## VESSEL_NAMEVIRGINIA MARISE -0.34158 2.03045 -0.168 0.8664
## AVGDEPTH -0.05070 0.01587 -3.196 0.0014 **
## BOTTEMP 0.16498 0.11091 1.488 0.1369
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 48.496 on 109 degrees of freedom
## Residual deviance: 24.572 on 92 degrees of freedom
## AIC: 140.99
##
## Number of Fisher Scoring iterations: 6
par(mfrow=c(2,2))
plot(yellowtailm_optimal)
#back transform the estimates and CIs
exp(cbind(OR = coef(yellowtailm_optimal), confint(yellowtailm_optimal)))
## OR 2.5 % 97.5 %
## (Intercept) 6.92461496 0.150825275 270.0393085
## VESSEL_NAMEANGELA AND ROSE 3.95201484 0.053810013 440.0986796
## VESSEL_NAMEBANTRY BAY 0.77798670 0.025398014 29.4390012
## VESSEL_NAMEDEBBIE SUE 1.49996738 0.008510348 2195.2586273
## VESSEL_NAMEELIZABETH & KATHERINE 0.12972094 0.002567317 6.2649896
## VESSEL_NAMEELLEN DIANE 2.30145109 0.029961721 296.6017823
## VESSEL_NAMEILLUSION 0.93532026 0.032928541 33.6029977
## VESSEL_NAMEJULIE ANN II 1.12351817 0.005844444 11940.2174807
## VESSEL_NAMELADY JANE 1.12385454 0.023912281 71.3152628
## VESSEL_NAMELISA ANN II 1.75125236 0.044083064 93.9339220
## VESSEL_NAMELISA ANN III 1.64431084 0.057691507 59.7603846
## VESSEL_NAMEMARY K 0.05527796 NA 11.7432698
## VESSEL_NAMEMYSTIC 0.59013757 0.018606496 23.1199089
## VESSEL_NAMEPROUD MARY 0.76383367 0.024491835 29.0103932
## VESSEL_NAMESAO PAULO 0.74647908 0.025931105 26.8710727
## VESSEL_NAMEVIRGINIA MARISE 0.71064626 0.009365556 52.2138927
## AVGDEPTH 0.95056375 0.919355115 0.9790631
## BOTTEMP 1.17937415 0.955484248 1.4881235
#look at the component residual plots
crPlots(yellowtailm_optimal)
# Extract the fitted and predicted values
yellowtailfl_paired_tows_5mi_72hr_re1$predict <- predict(yellowtailm_optimal)
yellowtailfl_paired_tows_5mi_72hr_re1$fitted <- fitted(yellowtailm_optimal)
#predicted vs observed values
yellowtailfl_paired_tows_5mi_72hr_re1 %>%
ggplot(aes(x = fitted,
y = relative_efficiency)) +
geom_point() +
geom_abline(intercept=0, slope=1) +
xlim(0,1) +
ylim(0,1) +
labs(x='Predicted Values', y='Actual Values', title='Predicted vs. Actual Values')
#predicted vs trawl type w point shape as explanatory variables - not important for this project but saving for my thesis work
# #mgmt_area
# yellowtailfl_paired_tows_5mi_72hr_re1 %>%
# ggplot(aes(x = VESSEL_NAME,
# y = fitted, color = mgmt_area)) +
# geom_jitter() +
# coord_flip()
#
# #season
# yellowtailfl_paired_tows_5mi_72hr_re1 %>%
# ggplot(aes(x = VESSEL_NAME,
# y = fitted, color = season)) +
# geom_jitter() +
# coord_flip()
#
# #depth
# yellowtailfl_paired_tows_5mi_72hr_re1 %>%
# ggplot(aes(x = VESSEL_NAME,
# y = fitted, color = AVGDEPTH)) +
# geom_jitter() +
# coord_flip()
#getting final RE estimate
yellowtailfl_paired_tows_5mi_72hr_re1 %>%
group_by(VESSEL_NAME) %>%
summarise(mean(fitted), n = n())
## # A tibble: 16 × 3
## VESSEL_NAME `mean(fitted)` n
## <chr> <dbl> <int>
## 1 ANGELA & ROSE 0.456 2
## 2 ANGELA AND ROSE 0.607 2
## 3 BANTRY BAY 0.402 11
## 4 DEBBIE SUE 0.664 1
## 5 ELIZABETH & KATHERINE 0.244 7
## 6 ELLEN DIANE 0.624 2
## 7 ILLUSION 0.426 15
## 8 JULIE ANN II 0.754 1
## 9 LADY JANE 0.768 6
## 10 LISA ANN II 0.674 4
## 11 LISA ANN III 0.569 9
## 12 MARY K 0.0533 2
## 13 MYSTIC 0.376 24
## 14 PROUD MARY 0.372 7
## 15 SAO PAULO 0.357 15
## 16 VIRGINIA MARISE 0.446 2
yellowtailfl_paired_tows_5mi_72hr_re1 %>%
summarise(mean(fitted), n = n())
## mean(fitted) n
## 1 0.4336806 110
yellowtailfl_paired_tows_5mi_72hr_re1 %>%
ggplot(aes(x = VESSEL_NAME,
y = fitted)) +
geom_jitter() +
stat_summary(
geom = "point",
fun = "mean",
col = "black",
size = 3,
shape = 24,
fill = "red"
) +
labs(x = "Vessel Name", y = "Fitted Values of Relative Efficiency") +
geom_hline(yintercept = 0.4336806, linetype = "dashed", color = "red") +
coord_flip()
A scatterplot matrix of all the variables of interest showed no correlation among explanatory variables, but this is difficult to assess given that most of the explanatory variables are categorical. A backward regression analysis was conducted, however no variables were found to be significant so model selection was based on AIC score and the deviance explained. The optimal model for relative efficiency for yellowtail flounder from the Study Fleet dataset included vessel, management area, depth, and bottom temperature as explanatory variables. However, that model did not produce estimates of the management area effect, so the model with vessel, depth, and bottom temperature was selected. The diagnostic plots show no concerning trends, so the interpretation of results proceeded with this optimal model. There was one potential outlier in the diagnostic plots, but further investigation provided no reasoning to exclude the data point so the analysis proceeded with that model. The component residual show a strong negative relationship with depth and a slightly positive relationship with bottom temperature. The most efficient vessels as compared to the reference vessel, the Angela & Rose, after accounting for depth and bottom temperature were the Ellen Diane which had a relative efficiency for yellowtail 2.30 times higher (95% CI: 0.029 to 296 times), the Angela and Rose which had a relative efficiency 3.95 times higher (95% CI: 0.054 to 440 times), and the Lisa Ann II which had a relative efficiency 1.75 times higher (95% CI: 0.044 to 93.93 times). The modeled values of relative efficiency show that the most efficient vessels compared to the survey trawl were the Mary K, the Elizabeth & Katherine, and the Sao Paulo. The mean of the relative efficiency of the modeled values was 0.43, which is less than if the commercial and survey trawls had equal efficiency
paired_tows_5mi_72hr_ob_re %>%
filter(SVSPP == 73 & NESPP4 == 818) %>%
distinct(trip_haul_ID, .keep_all = TRUE) %>%
ggplot() +
geom_bar(aes(x = trip_haul_ID, y = ob_species_cpua, fill = "Observer"), stat = "identity") +
geom_bar(aes(x = trip_haul_ID, y = -nefsc_species_cpua, fill = "NEFSC"), stat = "identity") +
labs(x = "Tow Number", y = CPUA_label,
title = "Comparison of CPUA for Atlantic Cod",
subtitle = "Pairs within 5 mi and 72 hrs") +
scale_fill_manual(name = "Dataset",
values = c("Observer" = "red", "NEFSC" = "blue"),
labels = c("Observer", "NEFSC")) +
theme_classic() +
theme(axis.text.x=element_blank())
#Looking at the relative efficiency of the study fleet tows compared to the nefsc survey
paired_tows_5mi_72hr_ob_re %>%
filter(SVSPP == 73 & NESPP4 == 818) %>%
distinct(trip_haul_ID, .keep_all = TRUE) %>%
ggplot() +
geom_histogram(aes(x = relative_efficiency), bins = 100, color = "black", fill = "white") +
geom_vline(aes(xintercept = 0.5),
color = "black", linetype = "dashed", size = 1) +
geom_vline(aes(xintercept = mean(relative_efficiency)),
color = "red", linetype = "dashed", size = 1) +
labs(x = "Relative Efficiency", y = "Count",
title = "Histogram of Relative Efficiency for Atlantic Cod",
subtitle = "Pairs within 5 mi and 72 hrs")
#Map of the matched tows
paired_tows_5mi_72hr_ob_re %>%
filter(SVSPP == 73 & NESPP4 == 818) %>%
distinct(trip_haul_ID, .keep_all = TRUE) %>%
ggplot() +
geom_sf(data = nefsc_strata, fill = "grey") +
geom_polygon(data = NEUS, aes(x=long, y = lat, group = group), fill = "darkgreen",
color="black",inherit.aes = FALSE) +
geom_point(data = cod_paired_tows_5mi_72hr_ob_re, aes(x = END_TOW_LON.x,y = END_TOW_LAT.x, color = "NEFSC")) +
geom_point(data = cod_paired_tows_5mi_72hr_ob_re, aes(x = END_TOW_LON.y,y = END_TOW_LAT.y, color = "Observer")) +
# geom_segment(data = paired_tows_1mi_24hr, aes(x = START_TOW_LON.x, xend = END_TOW_LON.x,
# y = START_TOW_LAT.x, yend = END_TOW_LAT.x)) +
coord_sf(ylim = c(39, 44), xlim = c(-74, -66)) +
scale_colour_manual("", breaks = c("NEFSC", "Observer"), values = c("blue", "red")) +
labs(title = "Map of Matched Observer and NEFSC Tows for Cod", subtitle = "within 5 mi and 72 hrs")
A comparison of the catch-per-unit area shows that survey trawl is catching more Cod than the commercial trawls from the Observer dataset in most instances. The histogram of the relative efficiency shows that the average relative efficiency is slightly higher than if the survey and commercial trawls were equally efficient. A map of the matched tows shows that the majority of matched tows are in the Gulf of Maine and the northern edge of Georges Bank.
#looking at the boxplot and histograms of the catch per unit area
box.plot_ob(cod_paired_tows_5mi_72hr_ob_re)
hist_kh_ob(cod_paired_tows_5mi_72hr_ob_re)
#log-transforming the data
cod_paired_tows_5mi_72hr_ob_re <- cod_paired_tows_5mi_72hr_ob_re %>%
mutate(diff = ob_species_cpua - nefsc_species_cpua,
log_ob_species_cpua = log(ob_species_cpua),
log_nefsc_species_cpua = log(nefsc_species_cpua),
diff_of_logs = log_ob_species_cpua - log_nefsc_species_cpua)
#looking at the boxplot and histograms of the difference between CPUA
box.plot_diff(cod_paired_tows_5mi_72hr_ob_re)
hist_kh_diff(cod_paired_tows_5mi_72hr_ob_re)
# #looking at the boxplot and histograms of the diff_og_logs
# box.plot_diff_log(cod_paired_tows_5mi_72hr_ob_re)
# hist_kh_diff_log(cod_paired_tows_5mi_72hr_ob_re)
#assumptions of normality and equal variance are met with the data,
#so the t-test was conducted with the raw data
cod_ttest_ob <- t.test(cod_paired_tows_5mi_72hr_ob_re$ob_species_cpua, cod_paired_tows_5mi_72hr_ob_re$nefsc_species_cpua,
paired = TRUE)
cod_ttest_ob
##
## Paired t-test
##
## data: cod_paired_tows_5mi_72hr_ob_re$ob_species_cpua and cod_paired_tows_5mi_72hr_ob_re$nefsc_species_cpua
## t = -2.6322, df = 53, p-value = 0.01109
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -296.55729 -40.05751
## sample estimates:
## mean of the differences
## -168.3074
# #back transform the estimate
# exp(cod_ttest_ob$estimate)
# exp(cod_ttest_ob$conf.int)
The boxplot for the data shows that the variance is not equal and the histogram shows that the catch rates for the commercial and survey nets are not normally distributed. However, because the data are paired, two observations (commercial/survey CPUA) from one study unit (‘paired tow’) the difference in catch-per-unit area needs to be investigated. The difference in catch-per-unit area is normally distributed and is relatively symmetric, so the analysis was conducted on the raw data.
These data provide moderately strong evidence that the mean difference between commercial and survey catch rates is non-zero (two-sided p-value = 0.01 from a paired t-test). The mean catch rate for Cod for commercial fishermen from the observer dataset was estimated to be 168.31 lb/km^2 less than the mean catch rate for the NEFSC survey trawl (95% confidence interval: 40.06 to 296.56 lb/km^2).
#scatterplot matrix of all the variables of interest
ggpairs(cod_paired_tows_5mi_72hr_ob_re, columns = c("trawl_type", "season", "mgmt_area", "AVGDEPTH", "BOTTEMP"))
#running all the models w every combination of variables and comparing them based on AIC and deviance explained
codm <- fit.models_ob(cod_paired_tows_5mi_72hr_ob_re)
compare.models_ob(codm)
| Model | AIC | deltaAIC | Deviance Explained |
|---|---|---|---|
| relative_efficiency ~ trawl_type +season + mgmt_area + BOTTEMP | 82 | 9 | 0.06 |
| relative_efficiency ~ trawl_type +season + BOTTEMP | 81 | 7 | 0.05 |
| relative_efficiency ~ trawl_type +season + mgmt_area | 80 | 7 | 0.16 |
| relative_efficiency ~ trawl_type +mgmt_area + BOTTEMP | 80 | 7 | 0.05 |
| relative_efficiency ~ trawl_type +BOTTEMP | 79 | 5 | 0.06 |
| relative_efficiency ~ trawl_type +season | 78 | 5 | 0.16 |
| relative_efficiency ~ trawl_type +season + mgmt_area + AVGDEPTH + BOTTEMP | 78 | 5 | 0.06 |
| relative_efficiency ~ trawl_type +mgmt_area | 78 | 5 | 0.16 |
| relative_efficiency ~ trawl_type +season + mgmt_area + AVGDEPTH | 77 | 4 | 0.05 |
| relative_efficiency ~ trawl_type +season + AVGDEPTH + BOTTEMP | 77 | 3 | 0.18 |
| relative_efficiency ~ trawl_type +mgmt_area + AVGDEPTH + BOTTEMP | 76 | 3 | 0.16 |
| relative_efficiency ~ trawl_type +mgmt_area + AVGDEPTH | 75 | 2 | 0.07 |
| relative_efficiency ~ trawl_type +season + AVGDEPTH | 75 | 2 | 0.19 |
| relative_efficiency ~ trawl_type +AVGDEPTH + BOTTEMP | 74 | 1 | 0.18 |
| relative_efficiency ~ trawl_type +AVGDEPTH | 73 | 0 | 0.19 |
#storing the optimal model and plotting the diagnostics
codm_ob_optimal <- glm(relative_efficiency ~ trawl_type + AVGDEPTH, family = binomial, data = cod_paired_tows_5mi_72hr_ob_re)
summary(codm_ob_optimal)
##
## Call:
## glm(formula = relative_efficiency ~ trawl_type + AVGDEPTH, family = binomial,
## data = cod_paired_tows_5mi_72hr_ob_re)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4348 -0.3611 0.1784 0.4496 1.0136
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.681908 0.791913 -0.861 0.389
## trawl_type2 0.353189 0.681992 0.518 0.605
## trawl_type3 -0.146080 1.482277 -0.099 0.921
## trawl_type4 -2.013589 2.785924 -0.723 0.470
## AVGDEPTH 0.009004 0.005776 1.559 0.119
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 22.828 on 53 degrees of freedom
## Residual deviance: 19.253 on 49 degrees of freedom
## AIC: 73.178
##
## Number of Fisher Scoring iterations: 4
par(mfrow=c(2,2))
plot(codm_ob_optimal)
#back transform the estimates and CIs
exp(cbind(OR = coef(codm_ob_optimal), confint(codm_ob_optimal)))
## OR 2.5 % 97.5 %
## (Intercept) 0.5056511 0.10065700 2.375254
## trawl_type2 1.4236000 0.38287663 5.751907
## trawl_type3 0.8640889 0.02838505 21.985127
## trawl_type4 0.1335087 NA 11.486530
## AVGDEPTH 1.0090449 0.99795541 1.021112
#look at the component residual plots
crPlots(codm_ob_optimal)
# Extract the fitted and predicted values
cod_paired_tows_5mi_72hr_ob_re$predict <- predict(codm_ob_optimal)
cod_paired_tows_5mi_72hr_ob_re$fitted <- fitted(codm_ob_optimal)
#predicted vs observed values
cod_paired_tows_5mi_72hr_ob_re %>%
ggplot(aes(x = fitted,
y = relative_efficiency)) +
geom_point() +
geom_abline(intercept=0, slope=1) +
xlim(0,1) +
ylim(0,1) +
labs(x='Predicted Values', y='Actual Values', title='Predicted vs. Actual Values')
#predicted vs trawl type w point shape as explanatory variables - not relevant for this project but saving for my thesis work
# #mgmt_area
# cod_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = mgmt_area)) +
# geom_jitter()
#
# #season
# cod_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = season)) +
# geom_jitter()
#
# #depth
# cod_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = AVGDEPTH)) +
# geom_jitter()
#
# #bottom temp
# cod_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = BOTTEMP)) +
# geom_jitter() +
# coord_flip()
#
# #ground cable length
# cod_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = GRCABLEN)) +
# geom_jitter() +
# coord_flip()
#
# #sweep diameter
# cod_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = SWEEPDIA)) +
# geom_jitter() +
# coord_flip()
#getting final RE estimate
cod_paired_tows_5mi_72hr_ob_re %>%
group_by(trawl_type) %>%
summarise(mean(fitted), n = n())
## # A tibble: 4 × 3
## trawl_type `mean(fitted)` n
## <fct> <dbl> <int>
## 1 1 0.611 37
## 2 2 0.625 14
## 3 3 0.469 2
## 4 4 0.155 1
cod_paired_tows_5mi_72hr_ob_re %>%
summarise(mean(fitted), n = n())
## mean(fitted) n
## 1 0.6005787 54
#plot of trawl type and relative efficiency estimates from the model and the mean for each trawl type
cod_paired_tows_5mi_72hr_ob_re %>%
ggplot(aes(x = trawl_type,
y = fitted)) +
geom_jitter() +
stat_summary(
geom = "point",
fun = "mean",
col = "black",
size = 3,
shape = 24,
fill = "red"
) +
labs(x = "Trawl Type", y = "Fitted Values of Relative Efficiency") +
geom_hline(yintercept = 0.4336806, linetype = "dashed", color = "red") +
scale_x_discrete(labels=c("1" = "Groundfish Trawls", "2" = "Flatfish Trawls",
"3" = "Separator Trawls", "4" = "Eliminator Trawls")) +
coord_flip()
cod_paired_tows_5mi_72hr_ob_re %>%
ggplot(aes(x = AVGDEPTH, y = relative_efficiency, color = trawl_type, pch = trawl_type)) +
geom_point() +
geom_smooth(method = 'glm', method.args = list(family = 'binomial')) +
labs(x = "Average Depth", y = "Relative Efficiency") +
theme_classic()
A scatterplot matrix of all the variables of interest showed no correlation among explanatory variables, but this is difficult to assess given that most of the explanatory variables are categorical. A backward regression analysis was conducted, however no variables were found to be significant so model selection was based on AIC score and the deviance explained. The optimal model for relative efficiency for Cod from the Observer dataset included trawl type and depth. The diagnostic plots show no concerning trends, so the interpretation of results proceeded with this optimal model. There was one potential outlier in the diagnostic plots, but further investigation provided no reasoning to exclude the data point so the analysis proceeded with that model. The component residual show a strong positive relationship with depth. The most efficient trawl type was the flatfish trawl (trawl_type 2) which had a relative efficency 1.42 times greater than the reference trawl type, the groundfish trawl after accounting for depth (95% CI: 0.38 to 5.75). The modeled values for relative efficiency show that the mean relative efficiency was greater than 0.5, illustrating that the survey trawl was more efficient that the commercial gears.
#Looking at the cpua for the paired tows for each species
paired_tows_5mi_72hr_ob_re %>%
filter(SVSPP == 105 & NESPP4 == 1230) %>%
distinct(trip_haul_ID, .keep_all = TRUE) %>%
ggplot() +
geom_bar(aes(x = trip_haul_ID, y = ob_species_cpua, fill = "Observer"), stat = "identity") +
geom_bar(aes(x = trip_haul_ID, y = -nefsc_species_cpua, fill = "NEFSC"), stat = "identity") +
labs(x = "Tow Number", y = CPUA_label,
title = "Comparison of CPUA for Yellowtail Flounder",
subtitle = "Pairs within 5 mi and 72 hrs") +
scale_fill_manual(name = "Dataset",
values = c("Observer" = "red", "NEFSC" = "blue"),
labels = c("Observer", "NEFSC")) +
theme_classic() +
theme(axis.text.x=element_blank())
#Looking at the relative efficiency of the study fleet tows compared to the nefsc survey
paired_tows_5mi_72hr_ob_re %>%
filter(SVSPP == 105 & NESPP4 == 1230) %>%
distinct(trip_haul_ID, .keep_all = TRUE) %>%
ggplot() +
geom_histogram(aes(x = relative_efficiency), bins = 100, color = "black", fill = "white") +
geom_vline(aes(xintercept = 0.5),
color = "black", linetype = "dashed", size = 1) +
geom_vline(aes(xintercept = mean(relative_efficiency)),
color = "red", linetype = "dashed", size = 1) +
labs(x = "Relative Efficiency", y = "Count",
title = "Histogram of Relative Efficiency for Yellowtail Flounder",
subtitle = "Pairs within 5 mi and 72 hrs")
#Map of the matched tows
paired_tows_5mi_72hr_ob_re %>%
filter(SVSPP == 105 & NESPP4 == 1230) %>%
distinct(trip_haul_ID, .keep_all = TRUE) %>%
ggplot() +
geom_sf(data = nefsc_strata, fill = "grey") +
geom_polygon(data = NEUS, aes(x=long, y = lat, group = group), fill = "darkgreen",
color="black",inherit.aes = FALSE) +
geom_point(data = yellowtailfl_paired_tows_5mi_72hr_ob_re, aes(x = END_TOW_LON.x,y = END_TOW_LAT.x, color = "NEFSC")) +
geom_point(data = yellowtailfl_paired_tows_5mi_72hr_ob_re, aes(x = END_TOW_LON.y,y = END_TOW_LAT.y, color = "Observer")) +
# geom_segment(data = paired_tows_1mi_24hr, aes(x = START_TOW_LON.x, xend = END_TOW_LON.x,
# y = START_TOW_LAT.x, yend = END_TOW_LAT.x)) +
coord_sf(ylim = c(39, 44), xlim = c(-74, -66)) +
scale_colour_manual("", breaks = c("NEFSC", "Observer"), values = c("blue", "red")) +
labs(title = "Map of Matched Observer and NEFSC Tows for Yellowtail Flounder", subtitle = "within 5 mi and 72 hrs")
A comparison of the catch-per-unit area shows that survey trawl is catching more yellowtail flounder than the commercial trawls from the Observer dataset in most instances. The histogram of the relative efficiency shows that the average relative efficiency is higher than if the survey and commercial trawls were equally efficient. A map of the matched tows shows that the majority of matched tows are in the Gulf of Maine and the southeastern edge of Georges Bank.
#looking at the boxplot and histograms of the catch per unit area
box.plot_ob(yellowtailfl_paired_tows_5mi_72hr_ob_re)
hist_kh_ob(yellowtailfl_paired_tows_5mi_72hr_ob_re)
#log-transforming the data
yellowtailfl_paired_tows_5mi_72hr_ob_re <- yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
mutate(diff = ob_species_cpua - nefsc_species_cpua,
log_ob_species_cpua = log(ob_species_cpua),
log_nefsc_species_cpua = log(nefsc_species_cpua),
diff_of_logs = log_ob_species_cpua - log_nefsc_species_cpua)
#looking at the boxplot and histograms of the difference between CPUA
box.plot_diff(yellowtailfl_paired_tows_5mi_72hr_ob_re)
hist_kh_diff(yellowtailfl_paired_tows_5mi_72hr_ob_re)
#looking at the boxplot and histograms of the diff_og_logs
box.plot_diff_log(yellowtailfl_paired_tows_5mi_72hr_ob_re)
hist_kh_diff_log(yellowtailfl_paired_tows_5mi_72hr_ob_re)
#assumptions of normality and equal variance are met with the log transformed data,
#so the t-test was conducted with the log-transformed data
yellowtail_ttest_ob <- t.test(yellowtailfl_paired_tows_5mi_72hr_ob_re$log_ob_species_cpua, yellowtailfl_paired_tows_5mi_72hr_ob_re$log_nefsc_species_cpua,
paired = TRUE)
yellowtail_ttest_ob
##
## Paired t-test
##
## data: yellowtailfl_paired_tows_5mi_72hr_ob_re$log_ob_species_cpua and yellowtailfl_paired_tows_5mi_72hr_ob_re$log_nefsc_species_cpua
## t = -3.1917, df = 50, p-value = 0.002445
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.1536280 -0.2624755
## sample estimates:
## mean of the differences
## -0.7080517
#back transform the estimate
exp(yellowtail_ttest_ob$estimate)
## mean of the differences
## 0.492603
exp(yellowtail_ttest_ob$conf.int)
## [1] 0.3154901 0.7691452
## attr(,"conf.level")
## [1] 0.95
#get the back transformed estimate into a percentage
1 - exp(yellowtail_ttest_ob$estimate)
## mean of the differences
## 0.507397
1 - exp(yellowtail_ttest_ob$conf.int)
## [1] 0.6845099 0.2308548
## attr(,"conf.level")
## [1] 0.95
The boxplot for the data shows that the variance is relatively equal but the histogram shows that the catch rates for the commercial and survey nets are not normally distributed. However, because the data are paired, two observations (commercial/survey CPUA) from one study unit (‘paired tow’) the difference in catch-per-unit area needs to be investigated. The difference in catch-per-unit area is not normally distributed and is only somewhat symmetric, so a log-transformation was performed. The histograms of the log-transformed data show that they are relatively normally distributed and the boxplot shows relative symmetry, so the analysis was conducted with the log-transformed data.
These data provide strong evidence that the mean difference between commercial and survey catch rates is non-zero (two-sided p-value = 0.0024 from a paired t-test). The mean catch rate for Yellowtail Flounder for commercial trawls from the observer dataset was estimated to be 51% less than the mean catch rate for the NEFSC survey (95% confidence interval: 23% to 68%).
#scatterplot matrix of all the variables of interest
ggpairs(yellowtailfl_paired_tows_5mi_72hr_ob_re, columns = c("trawl_type", "season", "mgmt_area", "AVGDEPTH", "BOTTEMP"))
#running all the models w every combination of variables and comparing them based on AIC and deviance explained
yellowtailm_ob <- fit.models_ob(yellowtailfl_paired_tows_5mi_72hr_ob_re)
compare.models_ob(yellowtailm_ob)
| Model | AIC | deltaAIC | Deviance Explained |
|---|---|---|---|
| relative_efficiency ~ trawl_type +mgmt_area + AVGDEPTH | 76 | 7 | 0.26 |
| relative_efficiency ~ trawl_type +season + mgmt_area + AVGDEPTH + BOTTEMP | 76 | 7 | 0.11 |
| relative_efficiency ~ trawl_type +mgmt_area | 74 | 5 | 0.10 |
| relative_efficiency ~ trawl_type +mgmt_area + AVGDEPTH + BOTTEMP | 74 | 5 | 0.12 |
| relative_efficiency ~ trawl_type +season + mgmt_area + BOTTEMP | 74 | 5 | 0.28 |
| relative_efficiency ~ trawl_type +season + mgmt_area + AVGDEPTH | 74 | 5 | 0.26 |
| relative_efficiency ~ trawl_type +season + AVGDEPTH + BOTTEMP | 73 | 4 | 0.26 |
| relative_efficiency ~ trawl_type +AVGDEPTH | 73 | 4 | 0.13 |
| relative_efficiency ~ trawl_type +mgmt_area + BOTTEMP | 72 | 3 | 0.19 |
| relative_efficiency ~ trawl_type +season + mgmt_area | 72 | 3 | 0.15 |
| relative_efficiency ~ trawl_type +AVGDEPTH + BOTTEMP | 72 | 3 | 0.29 |
| relative_efficiency ~ trawl_type +season + BOTTEMP | 71 | 2 | 0.28 |
| relative_efficiency ~ trawl_type +season + AVGDEPTH | 71 | 2 | 0.26 |
| relative_efficiency ~ trawl_type +BOTTEMP | 71 | 2 | 0.20 |
| relative_efficiency ~ trawl_type +season | 69 | 0 | 0.29 |
#storing the optimal model and plotting the diagnostics
yellowtailm_ob_optimal <- glm(relative_efficiency ~ trawl_type + season, family = binomial, data = yellowtailfl_paired_tows_5mi_72hr_ob_re)
summary(yellowtailm_ob_optimal)
##
## Call:
## glm(formula = relative_efficiency ~ trawl_type + season, family = binomial,
## data = yellowtailfl_paired_tows_5mi_72hr_ob_re)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9129 -0.2755 0.1122 0.3819 0.8650
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.2829 0.5770 2.223 0.0262 *
## trawl_type2 -0.2699 0.6562 -0.411 0.6809
## trawl_type3 -0.5475 1.5426 -0.355 0.7227
## trawl_type4 -1.3349 1.0646 -1.254 0.2099
## seasonSpring -1.0825 0.6541 -1.655 0.0979 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 14.805 on 50 degrees of freedom
## Residual deviance: 10.999 on 46 degrees of freedom
## AIC: 69.124
##
## Number of Fisher Scoring iterations: 4
par(mfrow=c(2,2))
plot(yellowtailm_ob_optimal)
#back transform the estimates and CIs
exp(cbind(OR = coef(yellowtailm_ob_optimal), confint(yellowtailm_ob_optimal)))
## OR 2.5 % 97.5 %
## (Intercept) 3.6069661 1.25307006 12.557712
## trawl_type2 0.7634736 0.20784580 2.803400
## trawl_type3 0.5784082 0.02141001 19.889140
## trawl_type4 0.2631811 0.02928074 2.210882
## seasonSpring 0.3387476 0.08807172 1.182124
#look at the component residual plots
crPlots(yellowtailm_ob_optimal)
# Extract the fitted and predicted values
yellowtailfl_paired_tows_5mi_72hr_ob_re$predict <- predict(yellowtailm_ob_optimal)
yellowtailfl_paired_tows_5mi_72hr_ob_re$fitted <- fitted(yellowtailm_ob_optimal)
#predicted vs observed values
yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
ggplot(aes(x = fitted,
y = relative_efficiency)) +
geom_point() +
geom_abline(intercept=0, slope=1) +
xlim(0,1) +
labs(x='Predicted Values', y='Actual Values', title='Predicted vs. Actual Values')
#predicted vs trawl type w point shape as explanatory variables - not important for this project but saving for my thesis work
# #mgmt_area
# yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = mgmt_area)) +
# geom_jitter()
#
# #season
# yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = season)) +
# geom_jitter()
#
# #depth
# yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
# ggplot(aes(x = trawl_type,
# y = fitted, color = AVGDEPTH)) +
# geom_jitter()
#getting final RE estimate
yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
group_by(trawl_type) %>%
summarise(mean(fitted), n = n())
## # A tibble: 4 × 3
## trawl_type `mean(fitted)` n
## <fct> <dbl> <int>
## 1 1 0.682 23
## 2 2 0.578 21
## 3 3 0.545 2
## 4 4 0.487 5
yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
summarise(mean(fitted), n = n())
## mean(fitted) n
## 1 0.6146174 51
yellowtailfl_paired_tows_5mi_72hr_ob_re %>%
ggplot(aes(x = trawl_type,
y = fitted)) +
geom_jitter() +
stat_summary(
geom = "point",
fun = "mean",
col = "black",
size = 3,
shape = 24,
fill = "red"
) +
labs(x = "Trawl Type", y = "Fitted Values of Relative Efficiency") +
geom_hline(yintercept = 0.6146174, linetype = "dashed", color = "red") +
scale_x_discrete(labels=c("1" = "Groundfish Trawls", "2" = "Flatfish Trawls",
"3" = "Separator Trawls", "4" = "Eliminator Trawls")) +
coord_flip()
A scatterplot matrix of all the variables of interest showed no correlation among explanatory variables, but this is difficult to assess given that most of the explanatory variables are categorical. A backward regression analysis was conducted, however no variables were found to be significant so model selection was based on AIC score and the deviance explained. The optimal model for relative efficiency for yellowtail flounder from the Observer dataset included trawl type and season. The diagnostic plots show some patterns of grouping, but that is believed to be due to the model only containing categorical variables, so the interpretation of results proceeded with this optimal model. The component residual show greater relative efficiency in the fall. The most efficient trawl type was the reference trawl (trawl_type 1), the groundfish trawl. The modeled values for relative efficiency show that the mean relative efficiency was greater than 0.5, illustrating that the survey trawl was more efficient that the commercial gears.